Abstract

Two very fast and simple $O(\lg n)$ iteration algorithms for individual Fibonacci numbers are given and compared to competing algorithms. A simple $O(\lg n)$ recursion is derived that can also be applied to Lucas numbers. A formula is given to estimate the largest $n$ for which $F_n$ does not overflow the implementation's data type. Timing runs on input that is too large for the computer representation are dangerous, because they lead to false research results.
1. Introduction

The determination of individual Fibonacci numbers, first defined in 1202, has an interesting and extensive literature [1,2]. The recursive definition $F_n = F_{n-1} + F_{n-2}$ leads naturally to iteration. Defining $F_0 = 0$ incorporates the two common statements of initial conditions, starting at $F_0$ or at $F_1$, and this fixes the value of $F_n$, which varies between authors. Lucas numbers are generated by using the initial conditions 1, 3, and it is convenient to define $L_0 = 2$ [6].

There are a number of direct iterative solutions for this recursive definition that are $O(n)$ [6]. De Moivre published a closed formula in 1730 that requires $n$ multiplications [1]. $F_n$ was first shown to be computable in $O(\lg n)$ time in 1978 [3], followed by improved algorithms [4,5], but the best method [5] is complicated and only becomes theoretically effective for extremely large $n$.
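As a point of reference for the $O(n)$ iterations mentioned above, here is a minimal Python sketch (our own illustration, not code from the cited papers; the function names are hypothetical) that runs the recurrence forward from an arbitrary initial pair. The Fibonacci and Lucas conventions used in this paper, $F_0 = 0, F_1 = 1$ and $L_0 = 2, L_1 = 1$, differ only in the arguments.

```python
def linear_sequence(n: int, c0: int, c1: int) -> int:
    """Return C_n for C_n = C_{n-1} + C_{n-2}, C_0 = c0, C_1 = c1, using n additions."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = c0, c1              # invariant: a = C_k, b = C_{k+1}, starting at k = 0
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_linear(n: int) -> int:
    return linear_sequence(n, 0, 1)    # F_0 = 0, F_1 = 1


def lucas_linear(n: int) -> int:
    return linear_sequence(n, 2, 1)    # L_0 = 2, L_1 = 1
```

For example, `fib_linear(92)` gives 7540113804746346429 and `lucas_linear(2)` gives 3.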



Email address: lfjsde@gmail.com



Preprint submitted to arXiv 



November 2, 2010 



The Fibonacci recursion formula can have other initial values. It is easy to show by induction that for $C_n = C_{n-1} + C_{n-2}$, where $C_0$ and $C_1$ are non-negative integers, $C_n = C_1 F_n + C_0 F_{n-1}$. So all numbers $C_n$ can be found in $O(\lg n)$ time by these methods.
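The identity can be checked numerically. The sketch below relies on Python's arbitrary-precision integers and, to stay self-contained, obtains $(F_{n-1}, F_n)$ from the standard fast-doubling identities $F_{2k} = F_k(2F_{k+1} - F_k)$ and $F_{2k+1} = F_k^2 + F_{k+1}^2$. That is a stand-in $O(\lg n)$ method, not one of the algorithms of this paper, and the function names are hypothetical.

```python
def fib_pair(n: int) -> tuple:
    """Return (F_n, F_{n+1}) by standard fast doubling, O(lg n) recursive steps."""
    if n == 0:
        return 0, 1
    a, b = fib_pair(n // 2)          # a = F_k, b = F_{k+1} with k = n // 2
    c = a * (2 * b - a)              # F_{2k}
    d = a * a + b * b                # F_{2k+1}
    return (d, c + d) if n % 2 else (c, d)


def general_term(n: int, c0: int, c1: int) -> int:
    """C_n for C_n = C_{n-1} + C_{n-2}, via C_n = C_1 * F_n + C_0 * F_{n-1}."""
    if n == 0:
        return c0
    f_prev, f_cur = fib_pair(n - 1)  # (F_{n-1}, F_n)
    return c1 * f_cur + c0 * f_prev
```

With $C_0 = 2$ and $C_1 = 1$ this yields the Lucas numbers, e.g. `general_term(10, 2, 1)` is 123.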

An experimental comparison of algorithms in [4] did not include the algorithms in this paper, but did include the closed-form De Moivre formula as a linear method, and derived from it an $O(\lg n)$ recursive method they called the Binet algorithm: $F_n = \lceil F_{n/2}^2 \times \sqrt{5} \rceil$ when $n = 2^m$. Because the derivation assumes $n = 2^m$, Binet does not work exactly for all $n$. This is a nice and interesting recursive result, but it is exact only for the subset of $n$ with $n = 2^m$, and it uses two multiplications and the ceiling function in each call.

fib(n)
    if n = 1 or n = 2 return 1
    else return ⌈(fib(n/2))² · √5⌉

([4] Fig. 2: Recursive Binet approximation, Cull & Holloway)
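A direct Python transcription of the Binet recursion, offered as a hedged sketch: it uses IEEE double precision for $\sqrt{5}$ (an assumption about the arithmetic, not something specified in [4]) and rejects any $n$ that is not a power of two, since the formula is exact only there.

```python
import math

SQRT5 = math.sqrt(5.0)


def binet_pow2(n: int) -> int:
    """F_n = ceil(F_{n/2}^2 * sqrt(5)) for n a power of two (two multiplications per call)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if n <= 2:
        return 1                          # F_1 = F_2 = 1
    half = binet_pow2(n // 2)             # F_{n/2}, exact integer
    return math.ceil(half * half * SQRT5)
```

Under these assumptions the sketch reproduces $F_{64} = 10610209857723$. At $n = 128$ the result needs 88 bits, more than the 53-bit mantissa of a double, so the value returned there cannot be exact.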

Using Lucas sequences [3], they [4] also constructed an efficient $O(\lg n)$ algorithm, named here Cullhow, requiring only two multiplications per loop step; they claimed this to be the best in theoretical and experimental comparisons of their algorithms. As expressed, Cullhow assumed $n = 2^m$, but the authors were aware of how to extend it to all $n$, and this was done by another author in [5]. This algorithm is our basis for comparison and is given in Section 1.1. These comparisons included the size of $F_n$, which grows as $O(F_n)$; thus, this execution cost is much higher than the simple iteration cost (which assumes that the cost of arithmetic is constant). However, the number of bits required for computer representation is only $\lfloor \lg F_n \rfloor + 1$, so the computer execution cost of arithmetic is actually based on $O(\lg F_n)$.
1.1. Comparison Algorithm by Takahashi [5]



fib(n)
    if n = 0 return 0
    else if n = 1 return 1
    else if n = 2 return 1
    else
        f ← 1
        l ← 1
        sign ← −1
        mask ← 2^(⌊log₂ n⌋ − 1)
        for i = 1 to ⌊log₂ n⌋ − 1
            temp ← f * f
            f ← (f + l)/2
            f ← 2 * (f * f) − 3 * temp − 2 * sign
            l ← 5 * temp + 2 * sign
            sign ← 1
            if (n & mask) ≠ 0
                temp ← f
                f ← (f + l)/2
                l ← f + 2 * temp
                sign ← −1
            mask ← mask/2
        if (n & mask) = 0
            f ← f * l
        else
            f ← (f + l)/2
            f ← f * l − sign
        return f

([5] Fig. 3: Product of Lucas numbers used to compute F_n for arbitrary n)
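For readers who want to run the comparison algorithm, here is a Python transcription of the pseudocode above (a sketch, not Takahashi's own code). Python integers stand in for the multiprecision arithmetic of [5], and the ordinary `*` replaces the FFT-based multiplication that [5] assumes. The comments record the invariant $f = F_k$, $l = L_k$, sign $= (-1)^k$, where $k$ is the number formed by the leading binary digits of $n$ processed so far.

```python
def fib_takahashi(n: int) -> int:
    """F_n via products of Lucas numbers, following the pseudocode of [5]."""
    if n == 0:
        return 0
    if n <= 2:
        return 1
    f, l, sign = 1, 1, -1                   # F_1, L_1, (-1)**1
    mask = 1 << (n.bit_length() - 2)        # 2**(floor(lg n) - 1)
    for _ in range(n.bit_length() - 2):     # floor(lg n) - 1 passes
        temp = f * f                        # F_k^2
        f = (f + l) // 2                    # F_{k+1}
        f = 2 * (f * f) - 3 * temp - 2 * sign   # F_{2k}
        l = 5 * temp + 2 * sign             # L_{2k} = 5 F_k^2 + 2 (-1)^k
        sign = 1
        if n & mask:                        # next bit is 1: move from 2k to 2k + 1
            temp = f
            f = (f + l) // 2                # F_{2k+1}
            l = f + 2 * temp                # L_{2k+1} = F_{2k+1} + 2 F_{2k}
            sign = -1
        mask >>= 1
    if (n & mask) == 0:
        return f * l                        # F_{2k} = F_k L_k
    f = (f + l) // 2                        # F_{k+1}
    return f * l - sign                     # F_{2k+1} = F_{k+1} L_k - (-1)^k
```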
2. Algorithms 

We present two effective representations for applying the De Moivre formula that have not appeared in other papers, and give a new integer algorithm that uses only two multiplications in a novel way that does not require the use of Lucas numbers [5].
2.1. Golden, a real number iteration process, and Rgolden, its recursion

The De Moivre closed form $F_n = (\varphi^n - \psi^n)/\sqrt{5}$ can be replaced by $F_n = \lceil \varphi^n/\sqrt{5} - 0.5 \rceil$ [1]. Based on efficiently computed powers of the golden ratio, $\varphi = (1 + \sqrt{5})/2$, Algorithm Golden is not only $O(\lg n)$ for all $n > 0$ but also uses fewer multiplications than other methods. It can also find Lucas numbers directly in $O(\lg n)$ time because $L_n = \varphi^n + \psi^n$, where $\psi = (1 - \sqrt{5})/2$. In the following discussion, we assume that $\varphi$ exists as a constant, just as $\pi$ does.

Algorithm: Golden(n) given n > 0, return F_n
    gi ← φ
    if (odd(n)) then F ← φ else F ← 1
    i ← n
    while (i > 1)
        { i ← i/2; gi ← gi * gi; if (odd(i)) F ← gi * F }
    return ⌈F/√5 − 0.5⌉
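A minimal Python sketch of Golden, assuming IEEE double-precision floats for $\varphi$ and $\sqrt{5}$ (the function and constant names are ours). It is a direct transcription of the pseudocode and inherits its truncation limits.

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0
SQRT5 = math.sqrt(5.0)


def golden(n: int) -> int:
    """F_n = ceil(phi**n / sqrt(5) - 0.5), with phi**n built by repeated squaring."""
    if n < 1:
        raise ValueError("n must be positive")
    gi = PHI                       # phi**(2**j) after j squarings
    f = PHI if n % 2 else 1.0      # accumulates phi**n
    i = n
    while i > 1:
        i //= 2
        gi *= gi                   # one squaring per pass
        if i % 2:
            f *= gi                # extra multiplication only when i is odd
    return math.ceil(f / SQRT5 - 0.5)
```

In double precision this agrees with the exact values for the small $n$ tried here, e.g. `golden(50)` returns 12586269025; for larger $n$ the truncation errors discussed in Section 3 eventually cause failures.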

How many odd values does $i$ take, counting $n$ itself? This is the number of 1 bits in the binary representation of $n$: at least 1 and at most $N = \lfloor \lg n \rfloor + 1$. It is easy to show that, averaged over all $n$ with $N$ binary digits, this count is $(N+1)/2$. Thus, about 1.5 multiplications on average are executed by the loop body. For all $n$, if $k$ is the number of multiplications per loop iteration, then $1 \le k \le 2$, compared to other methods where $k \ge 2$. The loop has $\lfloor \lg n \rfloor$ iterations, so Golden is $O(\lg n)$ in the number of loop executions.
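The multiplication count can be checked by brute force. The sketch below (our own helper names) counts the squarings and the odd-step multiplications of Golden's loop and averages them over every $n$ with exactly $N$ binary digits; for such $n$ the average works out to $(3N-2)/(2(N-1))$ per loop pass, which tends to 1.5.

```python
from fractions import Fraction


def golden_loop_multiplications(n: int):
    """Return (loop passes, multiplications) performed by Golden's while-loop on input n."""
    passes = mults = 0
    i = n
    while i > 1:
        i //= 2
        passes += 1
        mults += 1                 # gi <- gi * gi
        if i % 2:
            mults += 1             # F <- gi * F
    return passes, mults


def average_per_pass(bits: int) -> Fraction:
    """Average multiplications per loop pass over all n with exactly `bits` binary digits."""
    total_passes = total_mults = 0
    for n in range(1 << (bits - 1), 1 << bits):
        p, m = golden_loop_multiplications(n)
        total_passes += p
        total_mults += m
    return Fraction(total_mults, total_passes)
```

For example, `average_per_pass(16)` returns `Fraction(23, 15)`, about 1.53, while $n = 2^{10}$ needs 11 multiplications in 10 passes and $n = 2^{10} - 1$ needs 18 in 9.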

For $n = 2^m$, only one multiplication is required in a loop step, compared to the two multiplications for Binet and the two multiplications for Cullhow. There are two practical difficulties: a full-size multiplication is required in each loop iteration (otherwise a size-adjusting process is required), and error can occur due to the finite approximation of irrational numbers. Integer methods avoid both of these difficulties.

This can be converted to a simple $O(\lg n)$ recursion, with Rgolden calling the recursive procedure Rgold. Compared to the recursion for powers by Rawlins [7], ours uses one fewer variable and three fewer assignments by adding an else. A small revision gives a recursive solution for Lucas numbers.

Procedure: Rgold(n) given n ≥ 1, return φ^n
    if (n = 1) then return (φ)
    if (odd(n)) then return (φ * Rgold(⌊n/2⌋)²) else return (Rgold(n/2)²)

Algorithm: Rgolden(n) given n ≥ 0, return F_n
    if (n < 1) return (n)
    return ⌈Rgold(n)/√5 − 0.5⌉
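A Python sketch of Rgold and Rgolden under the same double-precision assumption as Golden. The Lucas variant `rlucas` is our guess at the "small revision" mentioned above (it is not given in the text): since $L_n = \varphi^n + \psi^n$ and $|\psi^n| < 0.5$ for $n \ge 2$, rounding $\varphi^n$ gives $L_n$.

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0
SQRT5 = math.sqrt(5.0)


def rgold(n: int) -> float:
    """phi**n for n >= 1 by recursive squaring, one call per binary digit of n."""
    if n == 1:
        return PHI
    half = rgold(n // 2)
    return PHI * half * half if n % 2 else half * half


def rgolden(n: int) -> int:
    """F_n = ceil(phi**n / sqrt(5) - 0.5); n < 1 is returned unchanged (F_0 = 0)."""
    if n < 1:
        return n
    return math.ceil(rgold(n) / SQRT5 - 0.5)


def rlucas(n: int) -> int:
    """Hypothetical Lucas variant: L_n = ceil(phi**n - 0.5) for n >= 2."""
    if n < 2:
        return (2, 1)[n]           # L_0 = 2, L_1 = 1
    return math.ceil(rgold(n) - 0.5)
```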

2.2. Integer $O(\lg n)$ Algorithms

In [4], Cullhow, using a product of Lucas numbers [3,5], was claimed to be the best. It used one square and one conventional multiplication but was only defined for $n = 2^m$; techniques to extend it to all $n$ are well known. In [5], the idea of using Lucas numbers was extended from $n = 2^m$ to all $n$, and the idea of using only two squares was introduced. That author claimed that performing squares using FFT-based multiplication would result in faster execution than the one-multiplication, one-square method when $n$ was sufficiently large. As can be seen in Section 1.1, the resulting algorithm is rather complicated, and the FFT-based multiplication referred to is represented by *; clearly, the algorithm does not implement FFT-based multiplication, and $n$ would need to be very large indeed to attain the theoretical improvement. The method we use later can also use FFT-based multiplication, so comparisons need only be made on the representations in this paper.

We give an algorithm where the central calculation for doubling is derived as follows. Using the well-known identities [1] $F_{n+m} = F_m F_{n+1} + F_{m-1} F_n$ and $F_{n+1} F_{n-1} = F_n^2 + (-1)^n$, it is an exercise to derive:

$$F_{2k+1} = F_{k+1}^2 + F_k^2$$

$$F_{2k-1} = F_k^2 + F_{k-1}^2$$

$$F_{2k} = F_{2k+1} - F_{2k-1}$$
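These three identities can be checked numerically. The sketch below (hypothetical helper names) computes $(F_{2k-1}, F_{2k}, F_{2k+1})$ from $(F_{k-1}, F_k, F_{k+1})$ with exactly three squarings and can be compared against a table built by plain addition.

```python
def fib_table(m: int) -> list:
    """Reference values F_0 .. F_{m-1} by repeated addition."""
    fs = [0, 1]
    while len(fs) < m:
        fs.append(fs[-1] + fs[-2])
    return fs[:m]


def double_by_three_squares(fkm1: int, fk: int, fkp1: int) -> tuple:
    """(F_{k-1}, F_k, F_{k+1}) -> (F_{2k-1}, F_{2k}, F_{2k+1}) using three squares."""
    sq_km1, sq_k, sq_kp1 = fkm1 * fkm1, fk * fk, fkp1 * fkp1
    f2km1 = sq_k + sq_km1                  # F_{2k-1} = F_k^2 + F_{k-1}^2
    f2kp1 = sq_kp1 + sq_k                  # F_{2k+1} = F_{k+1}^2 + F_k^2
    return f2km1, f2kp1 - f2km1, f2kp1     # F_{2k} = F_{2k+1} - F_{2k-1}
```

For instance, `double_by_three_squares(3, 5, 8)` (that is, $k = 5$) returns `(34, 55, 89)`.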

This can be executed in three squares by rearrangement. The following doubling calculation, which uses two multiplications, is more efficient and is derived as follows. First note that when $m = n$, an equation with two multiplications can be factored to give one multiplication:

$$F_{n+m} = F_{n+n} = F_{2n} = F_n (F_{n+1} + F_{n-1})$$

This gives a term for $F_{2k}$, but as seen above neither $F_{2k+1}$ nor $F_{2k-1}$ factors. Trying the term $F_{2n-2}$ with $m = n - 2$ gives $F_{n+(n-2)} = F_{n-2} F_{n+1} + F_{n-3} F_n$, which does not factor and would require five adjacent terms. Instead, consider

$$F_{2n-2} = F_{(n-1)+(n-1)} = F_{n-1} (F_n + F_{n-2}).$$
We now have four adjacent terms that determine their doubling equations:

$$F_{2k-2} = F_{k-1} (F_k + F_{k-2})$$

$$F_{2k} = F_k (F_{k+1} + F_{k-1})$$

$$F_{2k-1} = F_{2k} - F_{2k-2}$$

$$F_{2k+1} = F_{2k} + F_{2k-1}$$
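The four equations translate into a single doubling step. In this sketch (names are ours), two multiplications followed by one subtraction and one addition map $(F_{k-2}, F_{k-1}, F_k, F_{k+1})$ to $(F_{2k-2}, F_{2k-1}, F_{2k}, F_{2k+1})$; starting from $k = 1$ uses $F_{-1} = 1$, which follows from running the recurrence backwards.

```python
def fib_table(m: int) -> list:
    """Reference values F_0 .. F_{m-1} by repeated addition."""
    fs = [0, 1]
    while len(fs) < m:
        fs.append(fs[-1] + fs[-2])
    return fs[:m]


def double_four_terms(fkm2: int, fkm1: int, fk: int, fkp1: int) -> tuple:
    """(F_{k-2}, F_{k-1}, F_k, F_{k+1}) -> (F_{2k-2}, F_{2k-1}, F_{2k}, F_{2k+1})."""
    f2km2 = fkm1 * (fk + fkm2)     # F_{2k-2} = F_{k-1} (F_k + F_{k-2})
    f2k = fk * (fkp1 + fkm1)       # F_{2k}   = F_k (F_{k+1} + F_{k-1})
    f2km1 = f2k - f2km2            # F_{2k-1} = F_{2k} - F_{2k-2}
    f2kp1 = f2k + f2km1            # F_{2k+1} = F_{2k} + F_{2k-1}
    return f2km2, f2km1, f2k, f2kp1
```

Starting from $k = 1$, `double_four_terms(1, 0, 1, 1)` returns `(0, 1, 1, 2)`, i.e. $F_0, F_1, F_2, F_3$.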

This second doubling method requires four adjacent sequence terms and is not related to 2 × 2 matrix methods, which use only three adjacent sequence terms [3,4]. Rather than using $F_{n+m}$ to update, as in the odd(i) step of Golden (which is complex), we simply shift the four sequence terms one step forward as required. When to do this is determined by the odd quotients of $n$ (its 1 bits), taken in reverse order; reference to Rgolden will make this clear. The improvement in [5] over Cullhow was to replace the multiplications with two squares, assuming an efficient FFT-based method of computing squares, and to extend it to all $n$. It was estimated that $S(n) = \frac{2}{3} M(n)$, where $S(n)$ and $M(n)$ are the costs of a square and of a multiplication. This results in a tie for our two sets of equations: three squares cost $3 \cdot \frac{2}{3} M(n) = 2M(n)$, the same as two multiplications. Using Occam's razor, we claim our two-multiplication method is better for computer implementation, where the FFT speedup is not really practical.

2.3. Alternate, an integer iteration 

The logarithmic power method for all $n$ used in Golden was not efficient when applied to our formulas and was replaced with one based on the Rgolden recursion. The doubled sequence of four adjacent terms is shifted up one position as necessary. For simplicity, the sequence variables are renamed as follows: FLL = $F_{k-2}$, FL = $F_{k-1}$, FM = $F_k$, FH = $F_{k+1}$.
Algorithm: Alternate(n) given n > 0, return FM ← F_n
    N ← ⌊lg n⌋
    array markOdd[1..N] ← 0
    i ← n; j ← N
    while (j > 0) { if (odd(i)) markOdd(j) ← 1; i ← i/2; j ← j − 1 }
    FLL ← 1; FL ← 0; FM ← 1; FH ← 1
    j ← 1
    while (j ≤ N)
        FLL ← FL * (FM + FLL)
        FM ← FM * (FH + FL)
        FL ← FM − FLL
        if (markOdd(j)) { FLL ← FL; FL ← FM; FM ← FL + FLL }
        FH ← FM + FL
        j ← j + 1
    endwhile
    return FM
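A Python transcription of Alternate, offered as a sketch: Python's arbitrary-precision integers replace fixed-size words, and the array markOdd is stored with an unused slot 0 so that the indices $1..N$ match the pseudocode.

```python
def alternate(n: int) -> int:
    """F_n for n > 0 with two multiplications per doubling pass (Alternate)."""
    if n < 1:
        raise ValueError("n must be positive")
    N = n.bit_length() - 1                  # floor(lg n)
    mark_odd = [0] * (N + 1)                # mark_odd[j] = bit (N - j) of n; slot 0 unused
    i, j = n, N
    while j > 0:
        if i % 2:
            mark_odd[j] = 1
        i //= 2
        j -= 1
    fll, fl, fm, fh = 1, 0, 1, 1            # F_{-1}, F_0, F_1, F_2 (k = 1)
    for j in range(1, N + 1):
        fll = fl * (fm + fll)               # F_{2k-2}
        fm = fm * (fh + fl)                 # F_{2k}
        fl = fm - fll                       # F_{2k-1}
        if mark_odd[j]:                     # shift forward one position: k becomes 2k + 1
            fll, fl, fm = fl, fm, fm + fl
        fh = fm + fl                        # F_{k+1} for the new k
    return fm
```

For example, `alternate(92)` returns 7540113804746346429, the largest Fibonacci number that fits in a signed 64-bit integer.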

Moving the FH update below the if reduces the cost of shift updating. Two multiplications and four to five additions in the iterative loop make this very fast compared to [5]. No preconditions or postconditions are used, as were required in [5]. All general methods need some kind of odd(i)-selected calculation when $n$ is not a power of 2, and here this has been reduced to the insertion of a forward shift of two sequence terms and one addition. Thus, Alternate is competitive with the two-square method of [5], the best to date when squaring is faster than multiplication for integer arithmetic, and we claim the algorithm to be simpler and easier to understand.

3. Execution Analysis 

Computer execution is affected by the growth of the number of storage bits, $\eta_k$, required to represent and manipulate $F_k$ as $k$ approaches $n$. Previous papers did not consider the maximum Fibonacci number that could be calculated by a program.

$$\eta_n = \lfloor \lg F_n \rfloor + 1$$

$$\eta_n = \lfloor \lg \lceil \varphi^n/\sqrt{5} - 0.5 \rceil \rfloor + 1$$

$$\eta_n \approx \lfloor n \lg \varphi - 0.5 \lg 5 \rfloor + 1$$

$$n_{\max} \approx \frac{\eta_n + 0.5 \lg 5 - 1}{\lg \varphi}$$

For a given $\eta_n$, we can now estimate the largest index $n$ of a Fibonacci number that can be represented by a data type (and, conversely, estimate the bits required to store $F_n$ for a given $n$). Given a 128-bit signed integer representation, the estimated maximum value would be $n = 184$. For input $n = 2^8$, a 177-bit word is estimated. Table 1 shows some estimates and actual values for programs.



Table 1: Maximum n values for signed types of size η_n bits (estimated, and actual for the programs AC and Gold)

η_n       Type       Est. n_max   AC    Gold
24        32 float   34           na    30
31        32 int     45           46    36
53        64 real    76           na    74
63        64 long    91           92    na
90,995    integer    n = 2^17
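The estimates in this section are easy to reproduce. The sketch below (function names are hypothetical) evaluates $\eta_n \approx \lfloor n\lg\varphi - 0.5\lg 5\rfloor + 1$ and $n_{\max} \approx (\eta + 0.5\lg 5 - 1)/\lg\varphi$, rounding the latter down (our choice), and compares them with exact values obtained by iterating with unbounded integers. Treating a signed $b$-bit integer as holding values up to $2^{b-1}-1$ is an assumption of two's-complement representation.

```python
import math

LG_PHI = math.log2((1 + math.sqrt(5)) / 2)    # lg(phi), about 0.6942
HALF_LG5 = 0.5 * math.log2(5)                  # 0.5 lg 5, about 1.1610


def eta_estimate(n: int) -> int:
    """Estimated bits needed for F_n: floor(n lg(phi) - 0.5 lg 5) + 1."""
    return math.floor(n * LG_PHI - HALF_LG5) + 1


def n_max_estimate(eta: int) -> int:
    """Estimated largest n for eta bits: (eta + 0.5 lg 5 - 1) / lg(phi), rounded down."""
    return math.floor((eta + HALF_LG5 - 1) / LG_PHI)


def exact_bits(n: int) -> int:
    """Exact bit length of F_n, by plain iteration."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a.bit_length()


def n_max_exact_signed(bits: int) -> int:
    """Largest n with F_n <= 2**(bits - 1) - 1."""
    limit = (1 << (bits - 1)) - 1
    n, a, b = 1, 1, 1                          # a = F_n, b = F_{n+1}
    while b <= limit:
        n, a, b = n + 1, b, a + b
    return n
```

With these choices, `n_max_estimate` gives 34, 76 and 184 for 24, 53 and 128 bits, `eta_estimate(256)` and `exact_bits(256)` both give 177, and the exact search gives 46, 92 and 184 for signed 32-, 64- and 128-bit integers.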







The immediate objection to Golden is the use of irrational numbers and the possibility of computer error due to truncation. Using 64-bit long and 64-bit double in Java, Golden failed at $F_{75}$, differing by 1 in the last digit from AC, which failed at $F_{93}$ from overflow. Note that a double has fewer significant digits than a long of the same bit size, so part of this failure is a practical limitation of the computer representation of reals. In Table 1, assuming no truncation error, the predicted maximum with an effective mantissa of 53 bits is $n = 76$; the actual maximum, including truncation error, is $n = 74$. Given additional space, Golden works correctly for any $n$.
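The claim that Golden works for any $n$ given more space can be illustrated with Python's `decimal` module. In this sketch (ours, not the paper's), the working precision is sized from the number of decimal digits of $F_n$, about $n\log_{10}\varphi$, plus a few guard digits; the guard size of 10 is an arbitrary assumption, not a derived bound.

```python
import math
from decimal import ROUND_CEILING, Decimal, localcontext


def golden_decimal(n: int, guard_digits: int = 10) -> int:
    """Golden's loop in decimal arithmetic with precision scaled to the size of F_n."""
    if n < 1:
        raise ValueError("n must be positive")
    digits = int(n * math.log10((1 + math.sqrt(5)) / 2)) + guard_digits
    with localcontext() as ctx:
        ctx.prec = digits                   # significant digits for every operation below
        sqrt5 = Decimal(5).sqrt()
        phi = (1 + sqrt5) / 2
        gi = phi
        f = phi if n % 2 else Decimal(1)
        i = n
        while i > 1:
            i //= 2
            gi *= gi
            if i % 2:
                f *= gi
        x = f / sqrt5 - Decimal("0.5")
        return int(x.to_integral_value(rounding=ROUND_CEILING))
```

For example, `golden_decimal(100)` returns 354224848179261915075, well beyond the range of a 64-bit double.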

To examine truncation error, $\varphi$ and $\sqrt{5}$ were truncated in the programs to 9 places, representing 31-bit integers. The truncated Golden failed at $F_{37}$, differing by 1 in the last digit from a 32-bit int version of Alternating, which failed at $F_{47}$ from overflow. For 32-bit floating point, Golden failed at $F_{31}$, differing by 1. This truncation error explains why the estimates of $n_{\max}$ differ from the actual values found.

A secondary objection to Golden is that a full multiplication of size $\eta_n$ is required in each iteration, whereas the integer methods only require a multiplication of size $\eta_k$ in loop iteration $k$. Assuming that the cost of multiplying numbers with $k$ positions is $k^2$ (the best case for Golden, with a single multiplication), the average cost of multiplication for Golden is the full-size cost $M_G = \eta_n^2$. Assuming $n = 2^h$, the average cost of multiplication for Alternating, with two multiplications per iteration, is

$$M_A = \frac{2}{h} \sum_{i=1}^{h} \eta_{2^i}^2 \approx \frac{8}{3h} M_G,$$

because $\eta_{2^i}$ roughly doubles with each $i$, so the sum is about $\frac{4}{3}\eta_n^2$. Thus $M_A < M_G$ for $h \ge 3$, which means that Alternate has lower total multiplication costs than Golden when the size of the multiplication is taken into account. If the cost of multiplication depends only on the register size, then Golden is better.
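The cost comparison can be reproduced under the same $k^2$ model. In this sketch (our own construction), the bit lengths $\eta_{2^i}$ are taken from exact Fibonacci values, Golden is charged $h$ full-size multiplications of $\eta_n$ bits, and Alternating is charged two multiplications of $\eta_{2^i}$ bits in iteration $i$. Charging by result size rather than operand size is an assumption that, if anything, overstates Alternating's cost.

```python
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def cost_ratio(h: int) -> float:
    """M_A / M_G under the k**2 cost model for n = 2**h."""
    eta = [fib(2 ** i).bit_length() for i in range(h + 1)]
    golden_total = h * eta[h] ** 2                               # h full-size multiplications
    alternate_total = sum(2 * eta[i] ** 2 for i in range(1, h + 1))
    return alternate_total / golden_total
```

For $h = 12$ the ratio is about 0.222, close to $8/(3h) \approx 0.2222$.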

4. Experiments 

In [4], experiments were run only on powers of two from $n = 2^8$ up to $n = 2^{17}$. Our analysis indicates that these runs were on overflow values (or used an extended representation via software) and so did not measure the real register cost of computer multiplication and addition. Because of its simplicity, Binet should have timed much faster. There are a number of possible reasons that would explain why Binet appeared slower. In an overflow experiment on $n = 2^{17}$, Golden was faster than Alternate, and the linear Tumble was orders of magnitude slower.

It is difficult to do timing on fast modern multitasking systems. For instance, at the time of writing, Java's System.currentTimeMillis() is useless for these algorithms, with a resolution of 20 ms (run on a single-CPU Windows 2000 system). Timing in Java proved to be problematic at best. A native timer was used that gave a resolution of about 0.004 ms when background noise was low. However, repeat times varied to the extent that we do not find the method reliable for anything other than broad conclusions. Repeated runs indicated that Golden was perhaps faster at $F_{40}$ and that iteration was perhaps slower at $F_{92}$. Although average run times of programs can be measured, Java's run-time optimization makes it unsuitable for the experimental evaluation of algorithms.

The algorithms were recoded in C and timing experiments were run. Again, it was difficult to measure results. At $n = 92$ with 64-bit integers, Alternating was a bit slower than Tumble. However, Alternating using 32-bit long was a bit faster. In overflow, Golden was a bit faster than all others. For $n = 2^{10} - 1$ (the floating-point overflow limit for Golden), Tumble took somewhat longer than Alternating, which took about twice as long as Golden (an unfair comparison). On modern systems, multiplication is faster than the theoretical assumptions in [4] and [5] and, of course, extended arithmetic has additional execution costs; therefore, experimental verification of theoretical results can be difficult. The more serious conclusion is that a theory not based on modern systems may have difficulty predicting execution performance.
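The timing difficulties described above are usually mitigated by timing many calls per measurement and keeping the minimum of several repeats. A minimal Python sketch of that approach (not the harness used for the experiments reported here) follows; the two functions are simple stand-ins, a linear iteration and the standard fast-doubling recursion, and the repeat counts are arbitrary assumptions.

```python
import time
import timeit


def fib_linear(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_doubling(n: int) -> int:
    """Standard fast doubling, used only as a stand-in O(lg n) method."""
    def pair(m: int):
        if m == 0:
            return 0, 1
        a, b = pair(m // 2)
        c, d = a * (2 * b - a), a * a + b * b
        return (d, c + d) if m % 2 else (c, d)
    return pair(n)[0]


def best_time(func, n: int, number: int = 10_000, repeat: int = 5) -> float:
    """Minimum over `repeat` runs of the mean per-call time, in seconds, of func(n)."""
    timer = timeit.Timer(lambda: func(n))        # timeit uses time.perf_counter by default
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print("perf_counter resolution (s):", time.get_clock_info("perf_counter").resolution)
    for name, func in (("linear", fib_linear), ("doubling", fib_doubling)):
        print(f"{name:8s} F_92: {best_time(func, 92):.3e} s per call")
```

Reporting the clock resolution alongside the results makes it clear when a measured difference is below what the timer can resolve.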

5. Conclusions 

We were able to estimate the largest Fibonacci number that can be represented in a given finite storage. This was not done in previous papers. For 64-bit long, the estimate $n_{\max} = 91$ closely agreed with the actual $n_{\max} = 92$ for integer programs. For 64-bit floating point, the estimate $n_{\max} = 76$ closely agreed with the actual $n_{\max} = 74$ for Golden. The floating-point estimate errors are higher because of the truncated representation of irrationals.

Golden has the best iteration multiplication costs: $O(2\lg n)$ in the worst case, $O(1.5\lg n)$ on average, and $\Omega(\lg n)$ in the best case. Golden and Rgolden provide the simplest constructive proofs that Fibonacci numbers can be found in $O(\lg n)$ time. Their practical limitations result from errors accumulating from the finite representation of irrationals, the requirement for floating point, and full multiplication costs because of the irrational constants. However, by increasing word size appropriately, they can always give $F_n$. Even assuming the efficient squaring of [5], $S(n) = \frac{2}{3} M(n)$, Golden has a multiplication cost of 7/6 per iteration (one square plus, on average, half a multiplication) compared to 8/6 for [5] (two squares). This leads to the surprising result that, given sufficient storage, and assuming that multiplication cost depends only on the storage size, Golden is the fastest.

Alternating, compared to [5], does not require the introduction of Lucas numbers and is much simpler. An original aspect is the use of four sequence terms generated by only two multiplications, unlike matrix-based equations that use only three sequence terms but require three multiplications. It is $O(2\lg n)$ and $\Omega(2\lg n)$, and it is as fast as, and more practical than, [5] for computation.

Several algorithms were encoded in Java and some execution results were obtained. $F_{92}$ was the largest number found for each integer program when using 64-bit Java long. $F_{74}$ was the largest number found for Golden when using 64-bit floating point. Time comparisons for Golden were limited to $n \le 74$, where it appeared faster than other methods.

The run times in [4] begin at $F_{2^8}$, which requires about 178 integer bits, and so are not valid for measuring the effect of multiplication cost as intended. Although the calculated values appeared to agree with the run times in [4], these theoretical values were based on the run times of large inputs, contaminating the theoretical results. Run times for our integer programs were limited to $n \le 92$ when using Java 64-bit long. The timer could not measure any real difference, but the logarithmic programs were slightly faster than the linear ones.

With present-day computers, arithmetic operations are very efficient, and we may assume a calculation model in which arithmetic operations and assignment have equal cost. Control statements are the more complicated operations, as found in our experiments. A simple computation model would be to count each operation (keyword) as a cost of one, including if, then, else, while, and endWhile. In other words, a keyword has a cost of one. This is a reasonable computation model for comparing algorithms.

An important measure of the complexity of an algorithm is readability. Human readability is enhanced by shorter code and by the reduction of if-then-else structures that interrupt sequential flow. So the algorithm representation is part of the overall efficiency of an algorithm and reduces errors when it is implemented. Our design rule is: as simple as possible, as complex as necessary.